Accelerated Clustering Through Locality-Sensitive Hashing by Shaunak Kishore

Authors

Shaunak Kishore

Dennis Freeman

Abstract

We obtain improved running times for two algorithms for clustering data: the expectation-maximization (EM) algorithm and Lloyd's algorithm. The EM algorithm is a heuristic for finding a mixture of $k$ normal distributions in $\mathbb{R}^d$ that maximizes the probability of drawing $n$ given data points. Lloyd's algorithm is a special case of this algorithm in which the covariance matrix of each normally distributed component is required to be the identity. We consider versions of these algorithms where the number of mixture components is inferred by assuming a Dirichlet process as a generative model. The separation probability of this process, $a$, is typically a small constant. We speed up each iteration of the EM algorithm from $O(nd^2 k)$ to $O(ndk \log^3(k/a) + nd^2)$ time and each iteration of Lloyd's algorithm from $O(ndk)$ to $O(nd(k/a)^{0.39})$ time.

Thesis Supervisor: Jonathan A. Kelner
Title: Assistant Professor
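For orientation, the $O(ndk)$ baseline cost of Lloyd's algorithm quoted in the abstract comes from comparing each of the $n$ points against each of the $k$ centers in $d$ dimensions. The following is a minimal sketch of one such naive iteration, not the accelerated method of the thesis. It assumes a fixed number of centers (the thesis infers that number from a Dirichlet process, which is not modeled here), and the names `squared_distance` and `lloyd_iteration` are illustrative.

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_iteration(points: List[Point], centers: List[Point]) -> List[List[float]]:
    """One naive Lloyd iteration: assign points to nearest centers, then average."""
    k = len(centers)
    d = len(centers[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k

    # Assignment step: n points x k centers x d coordinates, i.e. O(ndk) work.
    for p in points:
        nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
        counts[nearest] += 1
        for i in range(d):
            sums[nearest][i] += p[i]

    # Update step: move each center to the mean of the points assigned to it.
    new_centers = []
    for j in range(k):
        if counts[j] == 0:
            # Assumption of this sketch: an empty cluster keeps its old center.
            new_centers.append(list(centers[j]))
        else:
            new_centers.append([s / counts[j] for s in sums[j]])
    return new_centers
```

The nearest-center search in the assignment step is the part that dominates the running time, so it is the natural place for an approximate search structure to replace the exhaustive loop over all $k$ centers.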

Similar resources

Accelerated similarity searching and clustering of large compound sets by geometric embedding and locality sensitive hashing

MOTIVATION Similarity searching and clustering of chemical compounds by structural similarities are important computational approaches for identifying drug-like small molecules. Most algorithms available for these tasks are limited by their speed and scalability, and cannot handle today's large compound databases with several million entries. RESULTS In this article, we introduce a new algori...

Kernelized Locality-Sensitive Hashing for Semi-Supervised Agglomerative Clustering

Large scale agglomerative clustering is hindered by computational burdens. We propose a novel scheme where exact inter-instance distance calculation is replaced by the Hamming distance between Kernelized Locality-Sensitive Hashing (KLSH) hashed values. This results in a method that drastically decreases computation time. Additionally, we take advantage of certain labeled data points via distanc...

Locality Sensitive K-means Clustering

Hierarchical clustering of large text datasets using Locality-Sensitive Hashing

In this paper, we present a hierarchical clustering algorithm for large text datasets using Locality-Sensitive Hashing (LSH). The main idea of LSH is to “hash” items several times, in such a way that similar items are more likely to be hashed to the same bucket than dissimilar items. The main drawback of conventional hierarchical algorithms is their large time complexity (e.g. Single Linka...

High-Throughput, Web-Scale Data Stream Clustering

Clustering is an important technique for analysing and interpreting massive quantities of data present on the web. However, the sheer volume of data, along with its often dynamic and fast-changing nature, provides a challenge for traditional clustering approaches. We present a parallel clustering system specifically designed for continuous, real-time clustering of web-scale message data streams. A...
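Several of the abstracts listed above rely on locality-sensitive hashing, described in one of them as hashing items several times so that similar items are more likely to share a bucket. As a rough illustration only, the sketch below uses random-hyperplane sign hashing, a common LSH family for cosine similarity. None of the listed abstracts states which hash family it uses, so this choice is an assumption, as are the names `make_hyperplanes`, `signature`, and `hamming`.

```python
import random
from typing import List, Sequence, Tuple


def make_hyperplanes(num_bits: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Draw num_bits random Gaussian normal vectors, one per hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(v: Sequence[float], planes: List[List[float]]) -> Tuple[int, ...]:
    """One bit per hyperplane: 1 if v lies on its non-negative side, else 0."""
    return tuple(
        1 if sum(h_i * v_i for h_i, v_i in zip(h, v)) >= 0 else 0
        for h in planes
    )


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two signatures disagree."""
    return sum(x != y for x, y in zip(a, b))


planes = make_hyperplanes(num_bits=16, dim=3)
v = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]    # same direction as v, different length
opposite = [-1.0, -2.0, -3.0]       # points the opposite way

# Vectors with a small angle between them disagree on few bits, so the
# Hamming distance between signatures acts as a cheap proxy for dissimilarity.
print(hamming(signature(v, planes), signature(same_direction, planes)))
print(hamming(signature(v, planes), signature(opposite, planes)))
```

In a clustering setting, items whose signatures match (or nearly match) would be placed in the same bucket and only those candidates compared exactly, which is where the savings over all-pairs distance computation come from.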

Publication date: 2013
